BINNED_STATISTIC

Overview

The BINNED_STATISTIC function computes summary statistics for data grouped into bins along a numeric axis. This is a generalization of a histogram: while a histogram counts the number of observations in each bin, BINNED_STATISTIC allows computation of mean, sum, median, standard deviation, minimum, maximum, or count for values falling within each bin.

This implementation wraps scipy.stats.binned_statistic from the SciPy library. The function takes two parallel arrays: an independent variable x that determines bin assignment, and a dependent variable values on which the statistic is computed. Data points are assigned to bins based on their x values, and the selected statistic is calculated from the corresponding values.

The binning process divides the range of x into equal-width intervals. By default, the range spans from the minimum to maximum values in x, but custom bounds can be specified using the lowerlimit and upperlimit parameters. Values of x outside the specified range are ignored. All bins except the rightmost are half-open intervals [a, b), meaning they include the left edge but exclude the right edge. The rightmost bin [a, b] includes both edges.

Supported statistics include:

  • mean: arithmetic mean of values in each bin (empty bins return NaN)
  • sum: sum of values, equivalent to a weighted histogram
  • median: median value in each bin
  • count: number of observations, equivalent to an unweighted histogram
  • std: standard deviation (calculated with \text{ddof}=0)
  • min / max: minimum or maximum value in each bin

For more information, see the SciPy binned_statistic documentation and the SciPy GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=BINNED_STATISTIC(x, values, bs_statistic, bins, lowerlimit, upperlimit)
  • x (list[list], required): Data to bin (independent variable).
  • values (list[list], required): Data to compute statistics on (dependent variable).
  • bs_statistic (str, optional, default: “mean”): Statistic to compute.
  • bins (int, optional, default: 10): Number of bins.
  • lowerlimit (float, optional, default: null): Lower bound for the bin range.
  • upperlimit (float, optional, default: null): Upper bound for the bin range.

Returns (list[list]): 2D list of bin statistics, or error message string.

Examples

Example 1: Mean statistic in 2 bins

Inputs:

x values bs_statistic bins
1 1 mean 2
1 1
2 2
5 1.5
7 3

Excel formula:

=BINNED_STATISTIC({1;1;2;5;7}, {1;1;2;1.5;3}, "mean", 2)

Expected output:

Result
1.3333
2.25

Example 2: Sum statistic in 2 bins

Inputs:

x values bs_statistic bins
1 1 sum 2
1 1
2 2
5 1.5
7 3

Excel formula:

=BINNED_STATISTIC({1;1;2;5;7}, {1;1;2;1.5;3}, "sum", 2)

Expected output:

Result
4
4.5

Example 3: Median with custom range

Inputs:

x values bs_statistic bins lowerlimit upperlimit
1 10 median 3 1 7
2 20
3 30
4 40
5 50
6 60
7 70

Excel formula:

=BINNED_STATISTIC({1;2;3;4;5;6;7}, {10;20;30;40;50;60;70}, "median", 3, 1, 7)

Expected output:

Result
15
35
60

Example 4: Count statistic in 2 bins

Inputs:

x values bs_statistic bins
1 10 count 2
2 20
3 30
4 40
5 50

Excel formula:

=BINNED_STATISTIC({1;2;3;4;5}, {10;20;30;40;50}, "count", 2)

Expected output:

Result
2
3

Python Code

from scipy.stats import binned_statistic as scipy_binned_statistic
import math

def binned_statistic(x, values, bs_statistic='mean', bins=10, lowerlimit=None, upperlimit=None):
    """
    Computes a binned statistic (mean, sum, median, etc.) for the input data.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): Data to bin (independent variable).
        values (list[list]): Data to compute statistics on (dependent variable).
        bs_statistic (str, optional): Statistic to compute. Valid options: Mean, Sum, Median, Count, Std, Min, Max. Default is 'mean'.
        bins (int, optional): Number of bins. Default is 10.
        lowerlimit (float, optional): Lower bound for the bin range. Default is None.
        upperlimit (float, optional): Upper bound for the bin range. Default is None.

    Returns:
        list[list]: 2D list of bin statistics, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    # Convert scalars to 2D lists if needed
    x = to2d(x)
    values = to2d(values)
    # Flatten 2D lists to 1D
    try:
        flat_x = [float(item) for row in x for item in (row if isinstance(row, list) else [row])]
        flat_values = [float(item) for row in values for item in (row if isinstance(row, list) else [row])]
    except Exception:
        return "Invalid input: x and values must be 2D lists or scalars of numbers."
    if len(flat_x) == 0 or len(flat_values) == 0:
        return "Invalid input: x and values must not be empty."
    if len(flat_x) != len(flat_values):
        return "Invalid input: x and values must have the same length."
    if bs_statistic not in ('mean', 'sum', 'median', 'count', 'std', 'min', 'max'):
        return "Invalid input: bs_statistic must be one of 'mean', 'sum', 'median', 'count', 'std', 'min', 'max'."
    try:
        nbins = int(bins)
    except Exception:
        return "Invalid input: bins must be an integer."
    if nbins < 1:
        return "Invalid input: bins must be >= 1."
    # Prepare range
    range_arg = None
    if lowerlimit is not None and upperlimit is not None:
        try:
            range_arg = (float(lowerlimit), float(upperlimit))
        except Exception:
            return "Invalid input: lowerlimit and upperlimit must be numbers."
    try:
        if range_arg:
            res = scipy_binned_statistic(flat_x, flat_values, statistic=bs_statistic, bins=nbins, range=range_arg)
        else:
            res = scipy_binned_statistic(flat_x, flat_values, statistic=bs_statistic, bins=nbins)
        stat = res.statistic
    except Exception as e:
        return f"scipy.stats.binned_statistic error: {e}"
    # Return as 2D list (column vector), converting NaN to None for Excel compatibility
    result = []
    for val in stat:
        if val is None or (isinstance(val, float) and math.isnan(val)):
            result.append([""])
        else:
            result.append([float(val)])
    return result

Online Calculator